Run-Time Support for Adaptive Load Balancing

Authors

  • Milind A. Bhandarkar
  • Robert Brunner
  • Laxmikant V. Kalé
Abstract

Many parallel scientific applications have dynamic and irregular computational structure. However, most such applications exhibit persistence of computational load and communication structure. This allows us to embed a measurement-based automatic load balancing framework in the run-time systems of the parallel languages used to build such applications. In this paper, we describe such a framework built for the Converse [4] interoperable runtime system. The framework is composed of mechanisms for recording application performance data, a mechanism for object migration, and interfaces for plug-in load balancing strategy objects. The strategy interfaces allow easy implementation of novel load balancing strategies that may use application characteristics from the entire machine or from only a local neighborhood. We present the performance of a few strategies on a synthetic benchmark, as well as the impact of automatic load balancing on an actual application.

1 Motivation and Related Work

An increasing number of emerging parallel applications exhibit dynamic and irregular computational structure. Irregularities may arise, for example, from the modeling of complex geometries and the use of unstructured meshes, while dynamic behavior may result from adaptive refinements and the evolution of a physical simulation. Such behavior presents serious performance challenges: load may be imbalanced to begin with due to the irregularities, and imbalances may grow substantially with dynamic changes. We are participating in physical simulation projects at the Computational Science and Engineering centers of the University of Illinois (rocket simulation, and simulation of metal solidification) where such behaviors are commonly encountered.

Load balancing is a fundamental problem in parallel computing, and a great deal of research has been done on this subject. However, much of this research is focused on improving the load balance of particular algorithms or applications. General-purpose load balancing research deals mainly with process migration, in operating systems and more recently in application frameworks. C++ libraries such as DOME [1] implement the data-parallel programming paradigm as distributed objects and allow migration of work in response to varying load conditions. Systems such as CARMI [10] simply notify the user program of the load imbalance and leave it to the application process to explicitly move its state to a new processor. Multithreaded systems such as PM [9] require every thread to store its state in specially allocated memory so that the system can migrate the thread automatically. An object migration system called ELMO [3], built on top of Charm [6, 7], implements object migration mainly for fault tolerance. Applications in areas such as VLSI and Computational Fluid Dynamics (CFD) use graph partitioning programs such as METIS [8] to provide an initial load balance. However, every such application has to specifically provide code for monitoring load imbalance and for invoking the load balancer periodically to deal with dynamic behavior.

We have developed an automatic measurement-based load balancing framework to facilitate high-performance implementations of such applications. The framework requires that a computation be partitioned into more pieces (typically implemented as objects) than there are processors, with the framework handling the placement of those pieces.
The framework relies on a "principle of persistence" that holds for most physical simulations: the computational load and communication structure of (even dynamic) applications tend to persist over time. For example, even though the load of some object instances may change drastically at an adaptive refinement, such events are infrequent, and the load remains relatively stable between them. The framework can be used to handle application-induced imbalances as well as external imbalances (such as those generated on a timeshared cluster). It cleanly separates the runtime data-collection and object-migration mechanisms, organized around a distributed database, from optional strategies that plug in modularly to decide which objects to migrate where. This paper presents results obtained using our load balancing framework. We briefly describe the framework, then the strategies currently implemented and how they compare on a synthetic benchmark, and finally results on a crack-propagation application implemented using it.

2 Load Balancing Framework

Our framework [2] views a parallel application as a collection of computing objects which communicate with each other. Furthermore, these objects are assumed to exhibit temporal correlation in their computation and communication patterns, allowing effective measurement-based load balancing without application-specific knowledge. The central component of the framework is the load balancer's distributed database, which coordinates load balancing activities. Whenever a method of a particular object runs, the time consumed by that object is recorded. Furthermore, whenever objects communicate, the database records information about the communication. This allows the database to form an object-communication graph, in which each node represents an object, with the computation time of that object as its weight, and each arc represents a communication pathway from one object to another, recording the number of messages and the total volume of communication along that arc.

The design of Charm++ [5] offers several advantages for this kind of load balancing. First, parallel programs are composed of many coarse-grained objects, which represent convenient units of work for migration. Also, messages are directed to particular objects, not processors, so an object may be moved to a new location without informing other objects about the change; the run-time system handles message delivery with forwarding. Furthermore, the message-driven design of Charm++ means that work is triggered by messages, which are dispatched by the run-time system. Therefore, the run-time knows which object is running at any particular time, so the CPU time and message traffic for each object can be deposited with the framework. Finally, the encapsulation of data within objects simplifies object migration.

However, the load balancing framework is not limited to Charm++. Any language implemented on top of Converse can utilize this framework. For this purpose, the framework does not interact with object instances directly. Instead, interaction between objects and the load balancing framework occurs through object managers. Object managers are parallel objects (with one instance on each processor) that are supplied by the language runtime system. They are responsible for the creation, destruction, and migration of language-specific objects, and they supply the load database coordinator with the computational loads and communication information of the objects they manage.
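To make the preceding description concrete, the per-processor load database that object managers feed, and the object-communication graph it accumulates, can be pictured roughly as in the C++ sketch below. The sketch is illustrative only: the names (LoadDatabase, ObjectRecord, CommRecord, recordWork, recordMessage) are hypothetical and do not correspond to the actual Converse/Charm++ interfaces.

    #include <map>
    #include <utility>

    // Hypothetical sketch of the per-processor load database described above:
    // graph nodes carry measured CPU time, arcs carry message counts and bytes.
    struct ObjectRecord {
        int    objectId;     // framework-assigned, system-wide unique id
        double cpuTime;      // accumulated time spent in this object's methods
    };

    struct CommRecord {
        long numMessages;    // messages sent along this arc
        long totalBytes;     // total communication volume along this arc
    };

    class LoadDatabase {
    public:
        // Called (e.g., by an object manager) when a method of `obj` finishes.
        void recordWork(int obj, double elapsedSeconds) {
            nodes_[obj].objectId = obj;
            nodes_[obj].cpuTime += elapsedSeconds;
        }

        // Called whenever one managed object sends a message to another.
        void recordMessage(int fromObj, int toObj, long bytes) {
            CommRecord &arc = arcs_[std::make_pair(fromObj, toObj)];
            arc.numMessages += 1;
            arc.totalBytes  += bytes;
        }

    private:
        std::map<int, ObjectRecord>               nodes_; // one node per local object
        std::map<std::pair<int, int>, CommRecord> arcs_;  // one arc per communicating pair
    };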
Object managers register the managed objects with the framework and are responsible for mapping the framework-assigned, system-wide unique object identifier to the language-specific identifier (such as a thread id in multithreaded systems, a chare id in Charm++, or a processor number in MPI).

We have ported a CFD application, written in Fortran 90 and MPI, with minimal changes to use our framework through an MPI library called ArrayMPI built on top of the Converse runtime system. The ArrayMPI library allows an MPI program to create a number of virtual processors, implemented as Converse threads, which are mapped by the runtime system to the available physical processors. The application program built using this MPI library then executes as if the system had as many physical processors as there are virtual processors. The load balancing framework keeps track of the computational load and communication graph of these virtual processors. Periodically, the MPI application transfers control to the load balancer using a special call, MPI_Migrate, which allows the framework to invoke a load balancing strategy and re-map the virtual processors to physical processors, thus maintaining load balance (a usage sketch of this call appears below).

3 Load Balancing Strategies

Load balancing strategies are a separate component of the framework. By separating out the data collection code common to all strategies, we have simplified the development of novel strategies (an illustrative strategy interface is sketched below). For efficiency, each processor collects only a portion of the object-communication graph, that is, only the parts concerning ...
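The time-stepping pattern referred to above might look like the following sketch. The spelling of the migration call, MPI_Migrate, is taken from the text; its exact signature, the helper computeOneTimestep, the step count, and the re-balancing period are assumptions of this sketch rather than part of the documented ArrayMPI interface.

    #include <mpi.h>

    // Provided by the ArrayMPI layer; the signature here is an assumption.
    extern "C" void MPI_Migrate(void);

    // Placeholder for the application's per-step work (hypothetical).
    static void computeOneTimestep(int step) {
        (void)step;   // ... exchange boundary data, update local state, etc.
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        const int numSteps     = 1000;  // assumed number of simulation steps
        const int migrateEvery = 50;    // assumed re-balancing period, in steps

        for (int step = 0; step < numSteps; ++step) {
            computeOneTimestep(step);
            // Periodically hand control to the load balancing framework so it
            // can invoke a strategy and re-map virtual processors if needed.
            if ((step + 1) % migrateEvery == 0) {
                MPI_Migrate();
            }
        }

        MPI_Finalize();
        return 0;
    }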
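The separation of strategies from data collection suggests a plug-in interface along the following lines. Everything in this sketch, including the LBStrategy and GreedyStrategy names, the ObjectLoad structure, and the greedy policy itself, is a hypothetical illustration rather than the framework's actual API; a real strategy would typically also consult the communication arcs of the object-communication graph.

    #include <algorithm>
    #include <functional>
    #include <queue>
    #include <utility>
    #include <vector>

    // Hypothetical view of the measured data a strategy receives.
    struct ObjectLoad {
        int    objectId;
        int    currentPe;   // processor the object currently resides on
        double cpuTime;     // measured load over the last period
    };

    // Hypothetical plug-in interface: given measured loads, produce a new
    // mapping from object id to destination processor.
    class LBStrategy {
    public:
        virtual ~LBStrategy() {}
        virtual std::vector<std::pair<int, int>>   // (objectId, newPe) pairs
        remap(const std::vector<ObjectLoad> &objs, int numPes) = 0;
    };

    // A simple greedy policy: place the heaviest objects first, each on the
    // currently least-loaded processor.
    class GreedyStrategy : public LBStrategy {
    public:
        std::vector<std::pair<int, int>>
        remap(const std::vector<ObjectLoad> &objs, int numPes) override {
            std::vector<ObjectLoad> sorted(objs);
            std::sort(sorted.begin(), sorted.end(),
                      [](const ObjectLoad &a, const ObjectLoad &b) {
                          return a.cpuTime > b.cpuTime;   // heaviest first
                      });

            // Min-heap of (accumulated load, processor id).
            typedef std::pair<double, int> PeLoad;
            std::priority_queue<PeLoad, std::vector<PeLoad>,
                                std::greater<PeLoad>> heap;
            for (int p = 0; p < numPes; ++p) heap.push(PeLoad(0.0, p));

            std::vector<std::pair<int, int>> assignment;
            for (const ObjectLoad &o : sorted) {
                PeLoad least = heap.top();
                heap.pop();
                assignment.push_back(std::make_pair(o.objectId, least.second));
                heap.push(PeLoad(least.first + o.cpuTime, least.second));
            }
            return assignment;
        }
    };

A strategy object of this kind could be handed the collected load data at each load balancing step, with its output mapping used to drive object migration.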





Publication date: 2000